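Federated learning (FL) trains a global model without sharing the decentralized raw data stored on multiple devices, which protects data privacy. Because device capabilities vary widely, FL frameworks struggle with straggler effects and outdated models. In addition, data heterogeneity causes severe accuracy degradation of the global model during FL training. To address these issues, we propose a hierarchical synchronous FL framework, FedHiSyn. FedHiSyn first clusters all available devices into a small number of categories based on their computing capacity. After a certain interval of local training, the models trained in the different categories are uploaded to a central server simultaneously. Within a single category, devices pass their locally updated model weights to one another along a ring topology. Since training in a ring topology is most efficient when devices have homogeneous resources, classifying devices by computing capacity mitigates the impact of straggler effects. Moreover, combining synchronous updates across categories with device communication within each category helps address data heterogeneity while achieving high accuracy. We evaluate the proposed framework on the MNIST, EMNIST, CIFAR10, and CIFAR100 datasets under diverse heterogeneous device settings. Experimental results show that FedHiSyn outperforms six baseline methods, e.g., FedAvg, SCAFFOLD, and FedAT, in terms of training accuracy and efficiency.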
translated by 谷歌翻译
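The three steps described in the FedHiSyn abstract (capacity-based clustering, ring-style training within a category, synchronous aggregation on the server) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the function names, the quadratic local loss, the equal-size split of devices, and the plain averaging on the server are all assumptions made for the example.

```python
import numpy as np

def cluster_by_capacity(capacities, num_categories):
    """Group device indices into categories of similar compute capacity.

    Assumption: devices are sorted by capacity and split into equal-size groups.
    """
    order = np.argsort(capacities)  # slowest to fastest
    return [group.tolist() for group in np.array_split(order, num_categories)]

def local_update(weights, target, lr=0.5, steps=1):
    """Toy local training: gradient steps on 0.5 * ||w - target||^2."""
    w = weights.copy()
    for _ in range(steps):
        w -= lr * (w - target)
    return w

def ring_pass(weights, ring, targets):
    """Pass the model once around the ring; each device trains, then forwards."""
    w = weights
    for device in ring:
        w = local_update(w, targets[device])
    return w

def hierarchical_round(global_w, categories, targets):
    """One round: ring training inside each category, then server averaging."""
    category_models = [ring_pass(global_w, ring, targets) for ring in categories]
    return np.mean(category_models, axis=0)

# Hypothetical setup: six devices with made-up capacities and local optima.
capacities = np.array([1.0, 8.0, 1.2, 7.5, 4.0, 4.2])
rng = np.random.default_rng(0)
targets = rng.normal(size=(6, 3))  # stands in for each device's local data

categories = cluster_by_capacity(capacities, num_categories=3)
w = np.zeros(3)
for _ in range(20):
    w = hierarchical_round(w, categories, targets)
```

Grouping devices of similar speed into the same ring means no device in a ring waits long for a much slower neighbor, which is the intuition the abstract gives for mitigating stragglers.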
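(Source) code summarization aims to automatically generate natural-language summaries or comments for a given code snippet. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive and abstractive methods. Extractive methods use retrieval techniques to extract a subset of important statements and keywords from the code snippet, and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity naming, so the generated summaries usually lack naturalness. Abstractive methods can generate human-written-like summaries by leveraging encoder-decoder models from the neural machine translation domain. However, the generated summaries often miss important factual details. To generate human-written-like summaries that preserve factual details, we propose a novel extractive-and-abstractive framework. The extractive module in the framework performs extractive code summarization: it takes in the code snippet and predicts the important statements that contain key factual details. The abstractive module performs abstractive code summarization: it takes in the entire code snippet and the important statements in parallel, and generates a succinct, human-written-like natural-language summary. We evaluate the effectiveness of our technique, called EACS, through extensive experiments on three datasets involving six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques on all three widely used metrics, namely BLEU, METEOR, and ROUGE-L.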
Kernels represent nonlocal dependence efficiently and are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed, with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean as the observation noise becomes small if the data induces a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small-noise limit. The covariance of the data-adaptive prior is the inversion operator, with a hyper-parameter selected adaptively to the data by the L-curve method. Furthermore, we provide a detailed analysis of the computational practice of the data-adaptive prior and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of four types of error: discretization error, model error, partial observation, and a wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small-noise limits.
Spatio-temporal modeling, a canonical task in multivariate time series forecasting, has been a significant research topic in the AI community. To address the underlying heterogeneity and non-stationarity implied in graph streams, we propose Spatio-Temporal Meta-Graph Learning, a novel Graph Structure Learning mechanism for spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner powered by a Meta-Node Bank into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a large-scale spatio-temporal dataset that contains a variety of non-stationary phenomena. Our model outperforms the state of the art by a large margin on all three datasets (over 27% MAE and 34% RMSE). In addition, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle locations and time slots with different patterns and adapt robustly to different anomalous situations. Code and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
Traffic forecasting, a canonical task in multivariate time series forecasting, has been a significant research topic in the AI community. To address the spatio-temporal heterogeneity and non-stationarity implied in traffic streams, we propose Spatio-Temporal Meta-Graph Learning, a novel Graph Structure Learning mechanism for spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner powered by a Meta-Node Bank into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a new large-scale traffic speed dataset that includes traffic incident information. Our model outperforms the state of the art by a large margin on all three datasets (over 27% MAE and 34% RMSE). In addition, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle road links and time slots with different patterns and adapt robustly to anomalous traffic situations. Code and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
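We provide a complete characterization of the identifiability of interaction kernels in the mean-field equations of interacting particle systems. The key is to identify the function spaces on which a probabilistic quadratic loss functional has a unique minimizer. We consider two data-adaptive $L^2$ spaces, one with the Lebesgue measure and the other with an intrinsic exploration measure. For each $L^2$ space, the Fréchet derivative of the loss functional leads to a positive semi-definite integral operator. Identifiability therefore holds on the eigenspaces of the integral operator with nonzero eigenvalues, and the function space of identifiability is the $L^2$-closure of the RKHS associated with the integral operator. Furthermore, identifiability holds on the whole $L^2$ space if and only if the integral operator is strictly positive. Thus, the inverse problem is ill-posed and regularization is necessary. In the context of truncated SVD regularization, we demonstrate numerically that the weighted $L^2$ space is preferable to the unweighted $L^2$ space, as it leads to more accurate regularized estimators.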
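As a generic illustration of the truncated SVD regularization mentioned in the abstract above, the sketch below solves an ill-conditioned linear system by discarding the directions associated with the smallest singular values. It is not the paper's estimator: the function name `truncated_svd_solve`, the 3x3 diagonal matrix, and the noise values are hypothetical choices made to keep the example small.

```python
import numpy as np

def truncated_svd_solve(A, b, rank):
    """Solve A x = b using only the `rank` largest singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(s))
    # Project b onto the leading left singular vectors and invert only those
    # singular values; the remaining directions are set to zero.
    coeffs = (U[:, :k].T @ b) / s[:k]
    return Vt[:k].T @ coeffs

# Hypothetical ill-conditioned problem: one singular value is nearly zero.
A = np.diag([2.0, 1.0, 1e-12])
x_true = np.array([1.0, 1.0, 0.0])
noise = np.array([0.0, 0.0, 1e-3])
b = A @ x_true + noise

x_naive = np.linalg.solve(A, b)              # noise is amplified by 1 / 1e-12
x_tsvd = truncated_svd_solve(A, b, rank=2)   # the unstable direction is dropped

print("naive error:", np.linalg.norm(x_naive - x_true))
print("tsvd error: ", np.linalg.norm(x_tsvd - x_true))
```

The truncation rank plays the role of the regularization parameter: keeping too many singular values amplifies noise, while keeping too few discards identifiable components.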
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, inputs, network regularization, and sequential distillation, and find that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, with gains of +4.2%/+2.4%/+1.4% for the ViT-Tiny, ViT-Small, and ViT-Base models, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of the multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across scales and make the BIQA regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). These modules help the model learn complex distortion patterns and optimize the regression problem in a way that follows the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.